Synthetic Data

Authors

Abstract

Demand for access to data, especially data collected using public funds, is ever growing. At the same time, concerns about the disclosure of the identities of, and sensitive information about, the respondents providing the data are making data collectors limit access to data. Synthetic data sets, generated to emulate certain key information found in the actual data and to provide the ability to draw valid statistical inferences, are an attractive framework to afford widespread access to data for analysis while mitigating privacy and confidentiality concerns. The goal of this article is to provide a review of various approaches for generating and analyzing synthetic data sets, their inferential justification, the limitations of the approaches, and directions for future research.

Similar resources

How Protective Are Synthetic Data?

This short paper provides a synthesis of the statistical disclosure limitation and computer science data privacy approaches to measuring the confidentiality protections provided by fully synthetic data. Since all elements of the data records in the release file derived from fully synthetic data are sampled from an appropriate probability distribution, they do not represent “real data,” but ther...

Synthetic Data for Social Good

Data scientists need access to data to do social good. But data owners must be conservative about how, when, and why they share data or risk violating the trust of the people they aim to help, losing their funding, or breaking the law. Data sharing agreements can help prevent privacy violations, but require a level of specificity that is premature during preliminary discussions, and can take ov...

New Shewhart-type synthetic X̄ control schemes for non-normal data

In this paper, Burr-type XII X̄ synthetic schemes are proposed as an alternative to the classical X̄ synthetic schemes when the assumption of normality fails to hold. First, the basic design of the Burr-type XII X̄ synthetic scheme is developed and its performance investigated using exact formulae. Secondly, the non-side-sensitive and side-sensitive Burr-type XII X̄ synthetic schemes are int...

On Regression-Tree-Based Synthetic Data Methods for Business Data

The challenge of balancing the competing objectives of allowing statistical analysis of confidential data and maintaining confidentiality is of great interest to national statistical agencies and other data custodians seeking to make their data available for research. This balance is often characterised as a trade-off between disclosure risk and data utility, where disclosure risk attempts to c...

Synthetic Data Generation using Benerator Tool

Datasets of different characteristics are needed by the research community for experimental purposes. However, real data may be difficult to obtain due to privacy concerns. Moreover, real data may not meet specific characteristics which are needed to verify new approaches under certain conditions. Given these limitations, the use of synthetic data is a viable alternative to complement the real ...

Journal

Journal title: Annual Review of Statistics and Its Application

Year: 2021

ISSN: 2326-8298, 2326-831X

DOI: https://doi.org/10.1146/annurev-statistics-040720-031848